PROVISIONAL SPECIFICATION FOR THE INVENTION ENTITLED:

"APPARATUS AND SYSTEM FOR CLASSIFYING AND CONTROL 
ACCESS TO INFORMATION" 



This invention is described in the following statement: 



Apparatus and system for classifying and control access to information 
TECHNICAL FIELD OF THE INVENTION 

THIS INVENTION relates to an apparatus and system for classifying
information on a communications network and in particular, but not limited to,
an apparatus and system for classifying content servers and for selectively
controlling access to classified content servers.

BACKGROUND OF THE INVENTION 

The phenomenal growth of information technology has allowed many
people to have access to diverse information on communications networks. The
Internet in particular allows fetching of information from any cooperating
computers or content servers located in different parts of the world by simply
clicking references to the information. As the number of accessible computers or
content servers and the amount of information over the communications network
grow daily, it becomes increasingly difficult to classify them manually.

Known systems for controlling the types of information accessible on a
network rely on comparing a requested destination with those on pre-determined
Access Control Lists (ACL) or on word matching to determine whether to allow
or deny access. This approach can be applied at the client node prior to
requesting the information or on any suitably intelligent network device capable
of intercepting the request or subsequent reply prior to it reaching the requester.
For example, in the case of an Internet browser running on a PC or work station,
a request is made for an Internet resource such as a web site. Software monitoring
such requests on the PC can be configured to scan a pre-determined list of site
addresses for a match. If found, access to the site may be denied and a suitable
message displayed informing the user. Alternatively, the request may be allowed
to proceed, but as data is received from the site it is scanned for any of a set of
pre-determined words, word fragments or phrases. If a match is found the site is
not displayed but instead replaced with a suitable message. Typically, this type of
control software is installed on a PC or work station which does not have
particularly strict access privileges. The control software can be easily removed,
disabled or otherwise circumvented, thereby defeating the control system.

A network device capable of intercepting the request or reply, such as a 
proxy server, may perform similar actions using the same methods of web site 
matching. This is usually maintained by a network administrator with strict access 
rights. Also, any clients that must pass through the device in order to access the 
network can have content control enforced. This allows content control of 
multiple clients from one central point. 

While these known systems do provide some access control abilities, there 
are several disadvantages. A system based on word or phrase matching can only 
match text and it therefore would allow access to undesired information 
comprising graphic images. Also, a single word may match a broad range of sites 
with quite different classes of information. As an example, when the word "sex"
is used to match pornographic sites the system would also block access to other
sites providing non-offensive information such as articles on biology.

A system based on access control lists is much more selective. Access 
would only be denied to sites contained in the lists. While a suitably large list 
could bar access to a great deal of undesirable information, it is difficult to keep
up to date due to the rapid increase in the number of new sites and removal of
sites.

The above systems also do not lend themselves to adaptation to other 
network protocols and services such as interactive chat, streaming video, email or 
encrypted data streams. Extending to different languages also poses a problem for 
globalisation of these systems. 

OBJECT OF THE INVENTION 

An object of the present invention is to alleviate or to reduce to a certain 
degree one or more of the above disadvantages. 

Another object of the present invention is to provide an apparatus/system for
classifying user profiles.

SUMMARY OF THE INVENTION 
In one aspect therefor the present invention resides in an apparatus for 
classifying information on communications network. The apparatus comprises 
means for obtaining one or more transmission characteristics of information on a 
path of said communications network, analysing means for predicting a 
classification of said information based on said one or more transmission 
characteristics. 

In a second aspect therefor the present invention resides in an apparatus
for classifying content servers accessible on communications network. The 
apparatus comprises means for obtaining one or more transmission characteristics 
of information provided by any of said content servers on a path of said 
communications network, analysing means for predicting a classification of said 
information based on said one or more transmission characteristics. 



In a third aspect therefor the present invention resides in a computer 
program for classifying information on communications network. The program 
comprises means for obtaining one or more transmission characteristics of 
information on a path of said communications network, analysing means for 
predicting a classification of said information based on said one or more 
transmission characteristics. 

In a fourth aspect therefor the present invention resides in a computer 
program for classifying content servers accessible on communications network. 
The program comprises means for obtaining one or more transmission
characteristics of information provided by any of said content servers on a path 
of said communications network, analysing means for predicting a classification 
of said information based on said one or more transmission characteristics. 

In a fifth aspect therefor the present invention resides in an 
apparatus/computer program for classifying user profiles of users accessing 
information or content terminals on communications network. The 
apparatus/computer program comprises means for obtaining one or more 
transmission characteristics of information or information provided by any of said 
content servers, on a path of said communications network, analysing means for 
predicting a classification of said information or content server based on said one 
or more transmission characteristics, and means for classifying user profile in 
accordance with the predicted classification. 

The above invention may also comprise means for storing said one or more 
transmission characteristics. 



In order that the present invention can be more readily understood and be
put into practical effect reference will now be made to the accompanying
drawings which illustrate one preferred embodiment of the invention and wherein:

BRIEF DESCRIPTION OF THE DRAWING 

Figure 1 is a schematic diagram of the apparatus according to the 
invention; 

Figure 2 is a table of selected data of captured packets of a search engine 
using the apparatus shown in Figure 1; 

Figure 3 is a partial table of selected data of captured packets of a news 
web site using the apparatus shown in Figure 1; 

Figure 4 is a table of selected data of captured packets of an entertainment
web site using the apparatus shown in Figure 1; 

Figure 5 is a table of selected data of captured packets of the web site of 
an e-commerce merchant using the apparatus shown in Figure 1; 

Figure 6 is a table of selected data of captured packets of the web site of 
another e-commerce merchant using the apparatus shown in Figure 1; 

Figure 7 is a table of selected data of captured packets of a pornography 
web site using the apparatus shown in Figure 1; 

Figure 8 is a table of selected data of captured packets of another
pornography web site using the apparatus shown in Figure 1;

Figure 9 is a table of model N1 results using the apparatus shown in
Figure 1;

Figure 10 is a table of model N2 results using the apparatus shown in
Figure 1;

Figure 11 is a table of model N3 results using the apparatus shown in
Figure 1; and

Figure 12 is a table of classification prediction confidence levels using the
apparatus shown in Figure 1.

DESCRIPTION OF THE PREFERRED EMBODIMENT 
Referring initially to Figure 1 there is shown an apparatus 10 for classifying
media or information flowing along a path of a communications network which in
this case is the Internet.

As can be seen, network traffic passing through the apparatus 10 is 
captured and analysed for providing statistics relating to interactions between two 
or more terminals (not shown). The captured traffic is first checked against a list 
of predetermined classifications to determine if it is known or unknown. 

When the captured traffic is of an unknown classification, various models 
(to be described more fully below) are applied to the data set in the captured 
traffic in order to predict the content classification. The models use parameters 
derived from a knowledge base of previously classified data sets. The model of best
fit determines the classification of the content of the newly captured traffic. Thus,
the web site sending the captured traffic is now classified and is added to the list 
of known classifications. 
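
The flow just described (look up the known list, apply the models, take the best fit, record the result) can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `classify_site`, `known`, `models` and `knowledge_base` are hypothetical, a profile is assumed to be a plain dictionary of observed statistics, and "best fit" is assumed to mean the highest score returned by any model.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: a profile is a dict of observed statistics for one site.
Profile = Dict[str, float]
# Assumption: a model returns a fit score for one classification.
Model = Callable[[Profile], float]


def classify_site(site: str,
                  profile: Profile,
                  known: Dict[str, str],
                  models: Dict[str, Model],
                  knowledge_base: List[Tuple[str, Profile, str]]) -> str:
    """Return the classification of `site`, predicting it if unknown."""
    # Step 1: check the list of predetermined classifications.
    if site in known:
        return known[site]
    # Step 2: apply every model and keep the classification of best fit.
    best = max(models, key=lambda name: models[name](profile))
    # Step 3: the site is now classified, so add it to the known list.
    known[site] = best
    # Step 4: store the data set so later model parameters can draw on it.
    knowledge_base.append((site, profile, best))
    return best
```

A second request for the same site then returns immediately from the known list without re-running the models.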

Following classification the captured data set is stored in the knowledge 
base. As the knowledge base expands, more data are used for the model 
parameters. This refines the apparatus and results in improved predictive 
performance. 



The sites that are deemed undesirable are added to access control lists
(ACLs). ACLs are used to control the flow of content between terminals. For
example, undesired content can be prevented from travelling further through the
network by simply not forwarding it, or by replacing it, or by intercepting the
request for such content and modifying its destination.
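
The three enforcement options named above (not forwarding, replacing, and redirecting) could be modelled along the following lines. The `Action` enum, the `apply_acl` function, the notice text and the substitute address `192.0.2.10` (a documentation-only address) are all hypothetical names chosen for this sketch.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class Action(Enum):
    DROP = "drop"          # do not forward the content
    REPLACE = "replace"    # forward a substitute message instead
    REDIRECT = "redirect"  # rewrite the destination of the request


def apply_acl(destination: str,
              payload: bytes,
              acl: Dict[str, Action],
              notice: bytes = b"Access denied",
              safe_destination: str = "192.0.2.10"
              ) -> Tuple[Optional[str], Optional[bytes]]:
    """Return the (destination, payload) to forward, or (None, None) to drop."""
    action = acl.get(destination)
    if action is None:
        return destination, payload      # not listed: forward unchanged
    if action is Action.DROP:
        return None, None                # listed: forward nothing
    if action is Action.REPLACE:
        return destination, notice       # listed: substitute the content
    return safe_destination, payload     # Action.REDIRECT
```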

Classifications of traffic from content servers are relatively static. On the
other hand, user terminals that interact with these content servers are variable and 
their classifications are considered transient classifications. 

Whereas classifications of content servers form a model of the style of 
content residing on the server, transient classifications form a model of style of 
content being viewed by a user terminal, or content consumer. This in effect 
forms a behaviour profile of such a consumer. This profile can be used to tailor 
the content to suit the consumer. 

As mentioned earlier the apparatus 10 captures a set of observed data 
relating to a network interaction event, and provides a set of results indicating the 
classification of a resource or personality residing at each network node involved 
in the interaction. This is accomplished by applying various statistical models to 
a profile, and testing this against results obtained from profiles of known 
classifications. In this example of the invention this process is represented by the
following formulas:

x is an unknown profile to be classified;

Profiles p1, p2, p3 ... pn are of known classifications;

Models M1, M2, M3 ... Mn are available to operate on these profiles; and

C1, C2, C3 ... Cn are profile classifications.

The population of a profile of classification C1 may be defined by the
population of M1(p). M1(x) may be tested against the true population using
standard statistical hypothesis methods.

A pre-determined set of media terminals of a classification are modelled by
various models M1, M2 .. Mn. Each model consists of an approach and a set of
parameters, e.g. linear regression with gradient and point of interception, so
that for a single classification M1(p1,p2 .. pn), M2(q1,q2 .. qn) .. Mn(r1,r2 .. rn)
are used to model the population from the classification. The models may be
based on mathematical structures, or arbitrary rules.
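
To make the idea of "an approach plus a set of parameters" concrete, the sketch below pairs a linear function with a fitted gradient and intercept. The `Model` class and the `fit_linear` helper are hypothetical names, and ordinary least squares is assumed as the fitting method since the text does not name one.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class Model:
    """A model is an approach (a function) plus its fitted parameters."""
    approach: Callable[..., float]
    parameters: Tuple[float, ...]

    def __call__(self, x: float) -> float:
        return self.approach(x, *self.parameters)


def linear(x: float, gradient: float, intercept: float) -> float:
    return gradient * x + intercept


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Ordinary least-squares fit; parameters are (gradient, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gradient = sxy / sxx
    return Model(linear, (gradient, mean_y - gradient * mean_x))
```

Other approaches (a threshold rule, a probability distribution) would fit the same shape by swapping the function and its parameter tuple.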

The models are continually refined as more network traffic passes through
the apparatus 10, thereby increasing the population space from which the
classifications are computed.

A terminal may be permanently or transitionally defined to a classification. 
A transitionally defined terminal may move between classifications based on the 
fit of the observed traffic to the models of the various classifications. 

Figures 2 to 8 are tables of selected data of traffic for testing the profile
of network interaction with a content server to determine if it contains media
content of a pornographic nature. The assumption is made that profiles for
content servers contain a variable which is the average size of graphical images
served.

A normal distribution or similar non-deterministic probability distribution
is then used to test the hypothesis that the profile belongs to a population
classified as pornographic. In this example, the population of the classification
may be defined by the population of N(a,b) where N is the image size and a and
b are the mean and variance respectively, based on a normal distribution. The
average and standard deviation derived from the observed samples are tested
against the true population using standard statistical hypothesis methods.
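
A hypothesis test of this kind might look something like the following sketch. It assumes a two-sided z-test of the sample mean against a population N(a, b) with known mean a and variance b, which is only one of the "standard statistical hypothesis methods" that could be meant. The function name is hypothetical and the population parameters in the usage lines are made up purely for illustration.

```python
import math
from typing import Sequence


def z_test_p_value(sizes: Sequence[float], a: float, b: float) -> float:
    """Two-sided p-value for the sample mean of `sizes` under N(a, b).

    `a` is the population mean and `b` the population variance, following
    the N(a, b) notation used in the text.
    """
    n = len(sizes)
    sample_mean = sum(sizes) / n
    standard_error = math.sqrt(b / n)
    z = (sample_mean - a) / standard_error
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Sizes in bytes of the image/gif objects listed in Figure 2 (site 1).
site1_images = [2637, 4672, 357, 11522, 1398, 1728, 962, 946, 1716]
# Hypothetical population parameters, not taken from the specification.
p = z_test_p_value(site1_images, a=30000.0, b=10000.0 ** 2)
print(f"p-value: {p:.4f}")
```

A small p-value suggests the observed profile is unlikely to belong to the population in question; how that maps onto the confidence levels reported in the figures is not specified here.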

In some cases this approach may be broadened to encompass analysis of
variance methods with multiple dependent variables, to model the characteristics
of a site. Traditional ANOVA may be applied to model the media content.

A variety of traditional deterministic and non-deterministic models may be 
applied to determine the hypothesis of profile classification. These may be 
changed or upgraded continually depending on the level of predictive power 
found. The functionality of models used is not limited to, but can include simple 
rules-of-thumb, deterministic and non-deterministic probability models, or arbitrary 
calculations. 

The choice of model is primarily dictated by the predictive power of that 
model against the population in question. 

Figures 2 through 8 show examples of the basic data set that can be gathered
by observing network traffic of a typical interaction between a client browser and
a web server.
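
As a rough illustration of how such a data set could be reduced to a profile, the sketch below totals the text and graphics bytes of the server responses listed in Figure 2 and derives the average image size. The function name `build_profile` and the dictionary keys are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# (content type, size in bytes) of the server responses listed in Figure 2.
FIG2_RESPONSES = [
    ("text/html", 9424), ("image/gif", 2637), ("image/gif", 4672),
    ("image/gif", 357), ("text/html", 11193), ("image/gif", 11522),
    ("image/gif", 1398), ("image/gif", 1728), ("image/gif", 962),
    ("image/gif", 946), ("image/gif", 1716),
]


def build_profile(objects: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Aggregate captured objects into simple per-site statistics."""
    text_bytes = 0
    graphic_bytes = 0
    image_count = 0
    for content_type, size in objects:
        if content_type.startswith("text/"):
            text_bytes += size
        elif content_type.startswith("image/"):
            graphic_bytes += size
            image_count += 1
    average = graphic_bytes / image_count if image_count else 0.0
    return {
        "text_bytes": text_bytes,
        "graphic_bytes": graphic_bytes,
        "image_count": image_count,
        "avg_image_size": average,
    }


profile = build_profile(FIG2_RESPONSES)
print(profile)
```

For site 1 the totals come to 20617 text bytes and 25938 graphics bytes, which agree with the site 1 row of the Model N2 table.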

Figures 9 to 11 illustrate a simple classification model. This model looks
at the size, content and relationships of objects being transmitted by a content
server. The outcome of this model is to determine if the media being transmitted
has pornographic content.

Classification: pornographic

Standard Model:

N1(a,b)
Where N1 is the image size, a and b are the mean and variance respectively,
based on a normal distribution.

N2(c,d)
Where N2 is the ratio of text to graphics, c and d are the total size of the text and
graphic objects respectively.

N3(e)
Where N3 is the count of word patterns matched from a list of pre-determined
words, and e is the text of an object.

Observed Samples are given in the tables shown in Figures 2 to 8. 

For model N1 shown in Figure 9, the normal distribution hypothesis test is
applied to the observed samples to derive the results.

The results show confidence at the 93% and 87% levels for sites 6 and 7
respectively that the sites belong to a population of pornographic sites. The other
samples give much lower confidence levels.

For model N2 shown in Figure 10, a simple rule is used to test if the ratio 
is below a pre-determined threshold. The results show that sites 2, 4, 6 and 7 are 
within the threshold rating. 
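
The N2 rule can be checked against the totals in the Model N2 table with a few lines of code. The sketch assumes the comparison is a strict "less than" applied to the unrounded ratio; the function names are hypothetical.

```python
from typing import Dict, List, Tuple

N2_THRESHOLD = 0.1  # threshold listed in the Model N2 table

# site -> (total text bytes, total graphics bytes), from the Model N2 table
TOTALS: Dict[int, Tuple[int, int]] = {
    1: (20617, 25938),
    2: (6402, 68319),
    3: (25821, 54455),
    4: (1701, 67037),
    5: (19019, 17252),
    6: (9869, 102919),
    7: (19446, 696989),
}


def text_to_graphics_ratio(text: int, graphics: int) -> float:
    """N2(c, d): total text size divided by total graphics size."""
    return text / graphics if graphics else float("inf")


def sites_below_threshold(totals: Dict[int, Tuple[int, int]],
                          threshold: float) -> List[int]:
    """Sites whose text-to-graphics ratio falls below the threshold."""
    return [site for site, (text, graphics) in sorted(totals.items())
            if text_to_graphics_ratio(text, graphics) < threshold]


print(sites_below_threshold(TOTALS, N2_THRESHOLD))  # [2, 4, 6, 7]
```

Site 6 is listed with a rounded ratio of 0.10, but its unrounded value is about 0.096, so it falls below the threshold only when the comparison is made before rounding.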

For Model N3 shown in Figure 11, a simple rule is used to test if the
number of words matching a list of patterns exceeds a pre-determined threshold.
The results show that sites 6 and 7 exceed the threshold.
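
A word-pattern count of the N3 kind could be implemented roughly as below. The pattern list and threshold are supplied by the caller because the specification's own list and threshold value are not reproduced here; case-insensitive substring matching is an assumption, and both function names are hypothetical.

```python
import re
from typing import Iterable


def count_pattern_matches(text: str, patterns: Iterable[str]) -> int:
    """N3(e): count case-insensitive occurrences of each pattern in `text`."""
    return sum(
        len(re.findall(re.escape(pattern), text, flags=re.IGNORECASE))
        for pattern in patterns
    )


def exceeds_threshold(text: str, patterns: Iterable[str], threshold: int) -> bool:
    """True when the match count is strictly greater than the threshold."""
    return count_pattern_matches(text, patterns) > threshold
```

Because `re.escape` is applied, each pattern is treated as literal text rather than as a regular expression.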

A weighting formula is then applied to derive a final result as shown in 
Figure 12. 
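
The weighting formula itself is not spelled out in the text, so the sketch below simply assumes a weighted average of per-model scores. The weight of 20 is the value listed alongside the N2 threshold in the Model N2 table; the N1 and N3 weights, the 1.0/0.0 scoring of the threshold rules, and the function name are assumptions made for illustration.

```python
from typing import Dict


def weighted_confidence(scores: Dict[str, float],
                        weights: Dict[str, float]) -> float:
    """Weighted average of per-model scores, each assumed to lie in [0, 1]."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Site 6: 0.93 is the N1 confidence quoted in the text; N2 and N3 are
# threshold rules, scored here as 1.0 (passed) or 0.0 (failed).
site6_scores = {"N1": 0.93, "N2": 1.0, "N3": 1.0}
# The N2 weight comes from the Model N2 table; the others are hypothetical.
weights = {"N1": 60.0, "N2": 20.0, "N3": 20.0}
print(round(weighted_confidence(site6_scores, weights), 3))
```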

Therefore, using this example model, the apparatus 10 would predict that
sites 6 and 7 are probably serving media with pornographic content, whereas sites
1 through 5 probably are not.

Whilst the above has been given by way of illustrative example of the 
present invention many variations and modifications thereto will be apparent to 
those skilled in the art without departing from the broad ambit and scope of the 
invention as herein set forth. 



DATED this 4th day of March 1999 

CLA I RV I CW PTY LTD O 
By their Patent Attorneys 
INTELLPRO 





Site 1 Objects - Content Type "Search Engine"

         Source                  Destination
Object   IP Address      Port    IP Address      Port    Type         Size    Timestamp
1        202.139.16.45   63450   204.71.200.72   80      GET          36      14:53:17
2        204.71.200.72   80      202.139.16.45   63450   text/html    9424    14:53:17
3        202.139.16.45   63450   204.71.200.72   80      GET          79      14:53:19
4        202.139.16.45   63450   204.71.200.72   80      GET          46      14:53:19
5        202.139.16.45   63450   204.71.200.72   80      GET          50      14:53:19
6        204.71.200.72   80      202.139.16.45   63450   image/gif    2637    14:53:19
7        204.71.200.72   80      202.139.16.45   63450   image/gif    4672    14:53:20
8        204.71.200.72   80      202.139.16.45   63450   image/gif    357     14:53:20
9        202.139.16.45   63450   204.71.200.72   80      GET          56      14:53:27
10       204.71.200.72   80      202.139.16.45   63450   text/html    11193   14:53:28
11       202.139.16.45   63450   204.71.200.72   80      GET          59      14:53:29
12       204.71.200.72   80      202.139.16.45   63450   image/gif    11522   14:53:29
13       202.139.16.45   63450   204.71.200.72   80      GET          67      14:53:30
14       202.139.16.45   63450   204.71.200.72   80      GET          67      14:53:30
15       202.139.16.45   63450   204.71.200.72   80      GET          59      14:53:30
16       204.71.200.72   80      202.139.16.45   63450   image/gif    1398    14:53:30
17       202.139.16.45   63450   204.71.200.72   80      GET          76      14:53:30
18       204.71.200.72   80      202.139.16.45   63450   image/gif    1728    14:53:30
19       202.139.16.45   63450   204.71.200.72   80      GET          69      14:53:30
20       204.71.200.72   80      202.139.16.45   63450   image/gif    962     14:53:30
21       204.71.200.72   80      202.139.16.45   63450   image/gif    946     14:53:31
22       204.71.200.72   80      202.139.16.45   63450   image/gif    1716    14:53:31

FIG. 2







Site 2 Objects - Content Type "News"

         Source                  Destination
Object   IP Address      Port    IP Address      Port    Type         Size    Timestamp
1        202.139.16.45   63450   165.69.1.187    80      GET          38      14:54:04
2        165.69.1.187    80      202.139.16.45   63450   text/html    2312    14:54:04
3        202.139.16.45   63450   165.69.1.187    80      GET          52      14:54:05
4        202.139.16.45   63450   165.69.1.187    80      GET          56      14:54:05
5        202.139.16.45   63450   165.69.1.187    80      GET          46      14:54:05
6        202.139.16.45   63450   165.69.1.187    80      GET          48      14:54:05
7        202.139.16.45   63450   165.69.1.187    80      GET          47      14:54:05
8        202.139.16.45   63450   165.69.1.187    80      GET          47      14:54:05
9        165.69.1.187    80      202.139.16.45   63450   text/html    333     14:54:05
10       165.69.1.187    80      202.139.16.45   63450   text/html    56      14:54:06
11       165.69.1.187    80      202.139.16.45   63450   text/html    2445    14:54:06
12       165.69.1.187    80      202.139.16.45   63450   text/html    202     14:54:06
13       165.69.1.187    80      202.139.16.45   63450   text/html    202     14:54:06
14       165.69.1.187    80      202.139.16.45   63450   text/html    56      14:54:06
15       202.139.16.45   63450   165.69.1.187    80      GET          71      14:54:06
16       165.69.1.187    80      202.139.16.45   63450   image/gif    1229    14:54:06
17       202.139.16.45   63450   165.69.1.187    80      GET          64      14:54:06
18       165.69.1.187    80      202.139.16.45   63450   image/gif    43      14:54:06
19       202.139.16.45   63450   165.69.1.187    80      GET          64      14:54:06
20       202.139.16.45   63450   165.69.1.187    80      GET          64      14:54:07
21       165.69.1.187    80      202.139.16.45   63450   image/gif    43      14:54:07
22       165.69.1.187    80      202.139.16.45   63450   image/gif    43      14:54:07
23       202.139.16.45   63450   165.69.1.187    80      GET          67      14:54:07
24       165.69.1.187    80      202.139.16.45   63450   image/gif    2442    14:54:07
25       202.139.16.45   63450   165.69.1.187    80      GET          73      14:54:07
26       165.69.1.187    80      202.139.16.45   63450   image/gif    1364    14:54:07
27       202.139.16.45   63450   165.69.1.187    80      GET          71      14:54:07
28       165.69.1.187    80      202.139.16.45   63450   image/gif    8942    14:54:07
29       202.139.16.45   63450   165.69.1.187    80      GET          71      14:54:07
30       202.139.16.45   63450   165.69.1.187    80      GET          65      14:54:08
31       165.69.1.187    80      202.139.16.45   63450   unknown      10      14:54:08
32       165.69.1.187    80      202.139.16.45   63450   image/gif    15550   14:54:08
33       202.139.16.45   63450   165.69.1.187    80      GET          77      14:54:08
34       165.69.1.187    80      202.139.16.45   63450   image/gif    4732    14:54:08
35       202.139.16.45   63450   165.69.1.187    80      GET          70      14:54:08
36       202.139.16.45   63450   165.69.1.187    80      GET          70      14:54:09
37       202.139.16.45   63450   165.69.1.187    80      GET          68      14:54:09
38       202.139.16.45   63450   165.69.1.187    80      GET          68      14:54:09
39       165.69.1.187    80      202.139.16.45   63450   image/gif    436     14:54:09
40       165.69.1.187    80      202.139.16.45   63450   image/gif    405     14:54:09
41       165.69.1.187    80      202.139.16.45   63450   image/gif    436     14:54:09
42       165.69.1.187    80      202.139.16.45   63450   image/gif    405     14:54:09

FIG. 3




Site 3 Objects - Content Type "Entertainment"

         Source                  Destination
Object   IP Address      Port    IP Address      Port    Type         Size         Timestamp
1        202.139.16.45   63450   204.202.129.23  80      [illegible]  [illegible]  [illegible]
2        204.202.129.23  80      202.139.16.45   63450   text/html    [illegible]  14:55:11
3        202.139.16.45   63450   204.202.129.23  80      GET          [illegible]  [illegible]
4        204.202.129.23  80      202.139.16.45   63450   text/html    [illegible]  14:55:12
5        202.139.16.45   63450   204.202.129.23  80      GET          [illegible]  14:55:13
6        202.139.16.45   63450   204.202.129.23  80      GET          [illegible]  [illegible]
7        202.139.16.45   63450   204.202.129.23  80      [illegible]  [illegible]  [illegible]
8        204.202.129.23  80      202.139.16.45   63450   image/jpeg   [illegible]  14:55:14
9        204.202.129.23  80      202.139.16.45   63450   image/gif    [illegible]  14:55:14
10       204.202.129.23  80      202.139.16.45   63450   image/gif    [illegible]  14:55:14
11       202.139.16.45   63450   204.202.129.23  80      GET          [illegible]  14:55:20
12       202.139.16.45   63450   204.202.129.23  80      GET          [illegible]  14:55:21
13       202.139.16.45   63450   204.202.129.23  80      GET          [illegible]  14:55:21
14       204.202.129.23  80      202.139.16.45   63450   image/gif    1733         14:55:22
15       204.202.129.23  80      202.139.16.45   63450   image/gif    5314         14:55:22
16       204.202.129.23  80      202.139.16.45   63450   image/gif    414          14:55:22
17       202.139.16.45   63450   204.202.129.23  80      GET          68           14:55:22
18       202.139.16.45   63450   204.202.129.23  80      GET          62           14:55:22
19       204.202.129.23  80      202.139.16.45   63450   image/gif    406          14:55:22
20       204.202.129.23  80      202.139.16.45   63450   image/gif    746          14:55:22
21       202.139.16.45   63450   204.202.129.23  80      GET          65           14:55:23
22       202.139.16.45   63450   204.202.129.23  80      GET          58           14:55:23
23       202.139.16.45   63450   204.202.129.23  80      GET          63           14:55:23
24       202.139.16.45   63450   204.202.129.23  80      GET          62           14:55:23
25       204.202.129.23  80      202.139.16.45   63450   image/gif    1665         14:55:23
26       204.202.129.23  80      202.139.16.45   63450   image/gif    35           14:55:24
27       204.202.129.23  80      202.139.16.45   63450   image/gif    906          14:55:24
28       204.202.129.23  80      202.139.16.45   63450   image/gif    447          14:55:24
29       202.139.16.45   63450   204.202.129.23  80      GET          67           14:55:24
30       202.139.16.45   63450   204.202.129.23  80      GET          58           14:55:24
31       202.139.16.45   63450   204.202.129.23  80      GET          62           14:55:24
32       204.202.129.23  80      202.139.16.45   63450   image/jpeg   7861         14:55:24
33       204.202.129.23  80      202.139.16.45   63450   image/gif    391          14:55:25
34       204.202.129.23  80      202.139.16.45   63450   image/gif    641          14:55:25
35       202.139.16.45   63450   204.202.129.23  80      GET          57           14:55:25
36       204.202.129.23  80      202.139.16.45   63450   image/gif    377          14:55:25
37       202.139.16.45   63450   204.202.129.23  80      GET          60           [illegible]
38       202.139.16.45   63450   204.202.129.23  80      GET          78           [illegible]
39       202.139.16.45   63450   204.202.129.23  80      GET          [illegible]  [illegible]
40       202.139.16.45   63450   204.202.129.23  80      GET          [illegible]  [illegible]
41       204.202.129.23  80      202.139.16.45   63450   image/gif    403          14:55:26
42       204.202.129.23  80      202.139.16.45   63450   image/gif    1796         14:55:26
43       204.202.129.23  80      202.139.16.45   63450   image/gif    6845         14:55:27
44       202.139.16.45   63450   204.202.129.23  80      GET          56           14:55:27
45       204.202.129.23  80      202.139.16.45   63450   image/jpeg   17796        14:55:27
46       202.139.16.45   63450   204.202.129.23  80      GET          56           14:55:27
47       204.202.129.23  80      202.139.16.45   63450   image/gif    49           14:55:27
48       204.202.129.23  80      202.139.16.45   63450   image/gif    44           14:55:27

FIG. 4



         Source                  Destination
Object   IP Address      Port    IP Address      Port    Type         Size         Timestamp
1        202.139.16.45   63450   209.143.240.6   80      GET          [illegible]  [illegible]
2        209.143.240.6   80      202.139.16.45   63450   text/html    1645         [illegible]
3        202.139.16.45   63450   209.143.240.6   80      GET          47           14:55:51
4        209.143.240.6   80      202.139.16.45   63450   text/html    56           [illegible]
5        202.139.16.45   63450   209.143.240.6   80      GET          64           [illegible]
6        202.139.16.45   63450   209.143.240.6   80      GET          63           14:55:53
7        202.139.16.45   63450   209.143.240.6   80      GET          63           14:55:53
8        209.143.240.6   80      202.139.16.45   63450   image/gif    290          [illegible]
9        202.139.16.45   63450   209.143.240.6   80      GET          64           14:55:54
10       209.143.240.6   80      202.139.16.45   63450   image/gif    403          14:55:54
11       209.143.240.6   80      202.139.16.45   63450   image/gif    381          14:55:54
12       202.139.16.45   63450   209.143.240.6   80      GET          63           14:55:54
13       202.139.16.45   63450   209.143.240.6   80      GET          65           14:55:54
14       209.143.240.6   80      202.139.16.45   63450   image/gif    348          14:55:54
15       202.139.16.45   63450   209.143.240.6   80      GET          65           14:55:54
16       202.139.16.45   63450   209.143.240.6   80      GET          65           14:55:54
17       209.143.240.6   80      202.139.16.45   63450   image/gif    354          14:55:54
18       209.143.240.6   80      202.139.16.45   63450   image/gif    600          14:55:54
19       202.139.16.45   63450   209.143.240.6   80      GET          65           14:55:55
20       202.139.16.45   63450   209.143.240.6   80      GET          54           14:55:55
21       209.143.240.6   80      202.139.16.45   63450   image/gif    490          14:55:55
22       202.139.16.45   63450   209.143.240.6   80      GET          53           14:55:55
23       209.143.240.6   80      202.139.16.45   63450   image/gif    571          14:55:55
24       209.143.240.6   80      202.139.16.45   63450   image/gif    322          14:55:55
25       209.143.240.6   80      202.139.16.45   63450   image/gif    571          14:55:55
26       202.139.16.45   63450   209.143.240.6   80      GET          55           14:55:55
27       209.143.240.6   80      202.139.16.45   63450   image/gif    363          14:55:55
28       202.139.16.45   63450   209.143.240.6   80      GET          63           14:55:55
29       202.139.16.45   63450   209.143.240.6   80      GET          63           14:55:55
30       202.139.16.45   63450   209.143.240.6   80      GET          62           14:55:55
31       209.143.240.6   80      202.139.16.45   63450   image/gif    241          14:55:56
32       202.139.16.45   63450   209.143.240.6   80      GET          59           14:55:56
33       209.143.240.6   80      202.139.16.45   63450   image/gif    488          14:55:56
34       209.143.240.6   80      202.139.16.45   63450   image/gif    463          14:55:56
35       202.139.16.45   63450   209.143.240.6   80      GET          59           14:55:56
36       209.143.240.6   80      202.139.16.45   63450   image/gif    714          14:55:56
37       202.139.16.45   63450   209.143.240.6   80      GET          53           14:55:56
38       209.143.240.6   80      202.139.16.45   63450   image/gif    35           14:55:56
39       202.139.16.45   63450   209.143.240.6   80      GET          53           14:55:56
40       209.143.240.6   80      202.139.16.45   63450   image/gif    1188         14:55:56
41       202.139.16.45   63450   209.143.240.6   80      GET          54           14:55:56
42       209.143.240.6   80      202.139.16.45   63450   image/gif    327          14:55:57

FIG. 5

FIG. 8

FIG. 9



Model N2

Threshold   Weight
0.1         20

Site    Text     Graphics    Ratio
1       20617    25938       0.79
2       6402     68319       0.09
3       25821    54455       0.47
4       1701     67037       0.03
5       19019    17252       1.10
6       9869     102919      0.10
7       19446    696989      0.03